Recent years have witnessed rapid progress in image captioning. However, the demands for large memory storage and heavy computation prevent these captioning models from being deployed on mobile devices. The main obstacles lie in the heavyweight visual feature extractors (i.e., object detectors) and complicated cross-modal fusion networks. To this end, we propose LightCap, a lightweight image captioner for resource-limited devices. The core design is built on the recent CLIP model for efficient image captioning. Specifically, on the one hand, we leverage the CLIP model to extract compact grid features without relying on time-consuming object detectors. On the other hand, we transfer the image-text retrieval design of CLIP to image captioning by devising a novel visual concept extractor and a cross-modal modulator. We further optimize the cross-modal fusion model and parallel prediction heads via sequential and ensemble distillations. With this carefully designed architecture, our model contains merely 40M parameters, reducing the model size by more than 75% and the FLOPs by more than 98% compared with current state-of-the-art methods. Despite its low capacity, our model still exhibits state-of-the-art performance on prevalent datasets, e.g., 136.6 CIDEr on the COCO Karpathy test split. Tested on a smartphone with only a single CPU, the proposed LightCap achieves a fast inference speed of 188 ms per image, which is ready for practical applications.
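The detector-free grid features mentioned above can be illustrated with a minimal sketch: a ViT-style CLIP image encoder outputs one token per image patch, and those patch tokens can simply be reshaped into a spatial feature grid instead of running an object detector. The shapes below (a 7×7 patch grid with 512-dimensional embeddings, plus a [CLS] token) are illustrative assumptions, not LightCap's actual configuration.

```python
import numpy as np

def patch_tokens_to_grid(tokens: np.ndarray, grid_size: int) -> np.ndarray:
    """Reshape ViT patch tokens (dropping the leading [CLS] token) into a
    spatial grid of visual features, used in place of detector regions."""
    patch_tokens = tokens[1:]                  # drop the [CLS] token
    dim = patch_tokens.shape[-1]
    return patch_tokens.reshape(grid_size, grid_size, dim)

# Toy example: a ViT-B/32-style encoder on a 224x224 image yields
# 7x7 = 49 patch tokens plus one [CLS] token.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((50, 512))        # [CLS] + 49 patch tokens
grid = patch_tokens_to_grid(tokens, grid_size=7)
print(grid.shape)  # (7, 7, 512)
```

The resulting grid can be fed directly to a cross-modal fusion module, avoiding the detection pipeline entirely.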
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
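The k-fold cross-validation that only 37% of respondents reported using can be sketched in a few lines: shuffle the sample indices once, partition them into k folds, and hold each fold out as validation in turn. This is a generic illustration of the technique the survey asks about, not code from any surveyed solution.

```python
import numpy as np

def kfold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)           # shuffle once
    folds = np.array_split(idx, k)             # partition into k folds
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(kfold_indices(100, k=5))
print(len(splits))        # 5 folds
print(len(splits[0][1]))  # 20 validation samples per fold
```

Averaging or ensembling the k resulting models is one common form of the model ensembling the survey also reports.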
Interoperability is a significant problem in Building Information Modeling (BIM). Object type, a kind of critical semantic information needed in multiple BIM applications such as scan-to-BIM and code compliance checking, also suffers when exchanging BIM data or creating models using software from other domains. This information can be supplemented using deep learning. Current deep learning methods mainly learn from the shape information of BIM objects for classification, leaving the relational information inherent in the BIM context unused. To address this issue, we introduce a two-branch geometric-relational deep learning framework that boosts previous geometric classification methods with relational information. We also present IFCNet++, a BIM object dataset that contains both geometric and relational information about the objects. Experiments show that our framework can be flexibly adapted to different geometric methods, and that relational features act as a bonus to general geometric learning methods, clearly improving their classification performance, thus reducing the manual labor of checking models and improving the practical value of enriched BIM models.
Personalized Federated Learning (PFL), which collaboratively trains a federated model while accommodating local clients under privacy constraints, has attracted much attention. Despite its popularity, it has been observed that existing PFL approaches result in sub-optimal solutions when the joint distribution among local clients diverges. To address this issue, we present Federated Modular Network (FedMN), a novel PFL approach that adaptively selects sub-modules from a module pool to assemble heterogeneous neural architectures for different clients. FedMN adopts a lightweight routing hypernetwork to model the joint distribution on each client and to produce a personalized selection of module blocks for each client. To reduce the communication burden in existing FL, we develop an efficient way for the clients and the server to interact. We conduct extensive experiments on real-world test beds, and the results show both the effectiveness and the efficiency of the proposed FedMN over the baselines.
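The module-routing idea described above can be sketched as follows: a small "hypernetwork" (here just a linear map) takes a client embedding and produces per-module gate probabilities, and the modules whose gates exceed a threshold are assembled into that client's personalized architecture. The shapes, the sigmoid gating, and the thresholding rule are illustrative assumptions, not FedMN's actual design.

```python
import numpy as np

def route_modules(client_embedding: np.ndarray,
                  routing_weights: np.ndarray,
                  threshold: float = 0.5) -> np.ndarray:
    """Map a client embedding to per-module selection probabilities and
    return a boolean mask of the modules selected for this client."""
    logits = client_embedding @ routing_weights   # one logit per module
    probs = 1.0 / (1.0 + np.exp(-logits))         # sigmoid gates
    return probs > threshold                      # boolean module mask

rng = np.random.default_rng(1)
emb = rng.standard_normal(8)       # one client's embedding
W = rng.standard_normal((8, 6))    # routes into a pool of 6 modules
mask = route_modules(emb, W)
print(mask.shape)  # (6,)
```

Because only the selected modules (and the tiny router) need to be exchanged, such a design also reduces per-round communication, which matches the efficiency claim in the abstract.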
Federated learning (FL) is a distributed machine learning framework that alleviates data silos, in which decentralized clients collaboratively learn a global model without sharing their private data. However, the clients' non-independent and identically distributed (non-IID) data negatively affect the trained model, and clients with different numbers of local updates may cause significant gaps among the local gradients in each communication round. In this paper, we propose a Federated Vector Averaging (FedVeca) method to address the above non-IID data problem. Specifically, we set a novel objective for the global model that is related to the local gradients. A local gradient is defined as a bidirectional vector with a step size and a direction, where the step size is the number of local updates and the direction is divided into positive and negative according to our definition. In FedVeca, the direction is affected by the step size, so we average the bidirectional vectors to reduce the effect of different step sizes. We then theoretically analyze the relationship between the step sizes and the global objective, and obtain an upper bound on the step sizes in each communication round. Based on this upper bound, we design an algorithm for the server and the clients to adaptively adjust the step sizes so that the objective approaches the optimum. Finally, we conduct experiments on different datasets, models, and scenarios by building a prototype system, and the experimental results demonstrate the effectiveness and efficiency of the FedVeca method.
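One plausible reading of the step-size-normalized averaging described above can be sketched in a few lines: divide each client's accumulated update by its number of local steps before averaging, so that clients that ran many local updates do not dominate the round. This is a simplified illustration of the idea, not the paper's exact update rule.

```python
import numpy as np

def step_normalized_average(updates, step_sizes):
    """Average client updates after normalizing each by its number of
    local steps, reducing the effect of heterogeneous step sizes."""
    normalized = [u / s for u, s in zip(updates, step_sizes)]
    return np.mean(normalized, axis=0)

# Two clients move in the same direction with different local step counts.
u1 = np.array([4.0, -2.0])   # client 1 ran 4 local steps
u2 = np.array([1.0, -0.5])   # client 2 ran 1 local step
avg = step_normalized_average([u1, u2], step_sizes=[4, 1])
print(avg)  # both clients contribute equally after scaling
```

Without the normalization, a plain average would be pulled toward the client with more local steps, which is exactly the gap between local gradients the abstract describes.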
Context: Efficient processing of big data is a challenging task for SQL and NoSQL databases, where an effective software architecture plays a vital role. SQL databases are designed for structured data and support vertical scalability. In contrast, horizontal scalability is supported by NoSQL databases, which can efficiently process large volumes of unstructured data. The right paradigm can be chosen according to an organization's needs; however, making the correct choice can often be challenging. SQL and NoSQL databases follow different architectures, and a mixed model is followed by each category of NoSQL databases. Hence, data movement becomes difficult for cloud consumers using multiple cloud service providers (CSPs). In addition, each cloud platform (IaaS, PaaS, SaaS, and DBaaS) also follows various paradigms. Objective: This systematic literature review (SLR) aims to study the relevant articles related to the software architectures of SQL and NoSQL databases and to address data portability and interoperability among various cloud platforms. The state of the art presents many performance-comparison studies of SQL and NoSQL databases by observing their scaling, performance, availability, consistency, and sharding characteristics. According to these studies, the structure of NoSQL database design can be the right choice for big data analytics, whereas SQL databases are suitable for OLTP databases. Researchers have proposed many approaches related to data movement in the cloud. Platform-specific APIs have been developed, which make data movement difficult for users. Hence, data portability and interoperability issues arise during data movement across multiple CSPs. To minimize developers' efforts and interoperability problems, unified APIs are required to make data movement relatively accessible among various cloud platforms.
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed videos. In Track 1, we use the popular dataset DIV2K as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results for Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art of super-resolution on compressed images and videos. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.
Traditional pixel-wise image attack algorithms have poor robustness against defense algorithms, i.e., the attack strength drops dramatically when defense algorithms are applied. Although generative adversarial networks (GANs) can partially address this problem by synthesizing more semantically meaningful texture patterns, the main limitation is that existing generators can only generate images of a specific scale. In this paper, we propose a scale-free attack algorithm that globally synthesizes semantically meaningful adversarial patterns onto images of arbitrary scales. Our generative attack method consistently outperforms state-of-the-art methods in various attack settings, i.e., the proposed method largely degrades the performance of various image classification, object detection, and instance segmentation algorithms under different advanced defense methods.
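The scale-free application of one adversarial pattern can be sketched as follows: resize a fixed pattern to the target image's spatial size (nearest-neighbor here) and add it under an L-infinity budget. This only illustrates applying a single pattern at arbitrary scales; the paper's method synthesizes the pattern with a generator, which is omitted here, and the 8/255 budget is an illustrative assumption.

```python
import numpy as np

def apply_pattern(image: np.ndarray, pattern: np.ndarray,
                  epsilon: float = 8 / 255) -> np.ndarray:
    """Nearest-neighbor-resize a fixed adversarial pattern to the image's
    size and add its sign under an L-infinity budget of epsilon."""
    h, w = image.shape[:2]
    ph, pw = pattern.shape[:2]
    rows = np.arange(h) * ph // h           # nearest-neighbor row indices
    cols = np.arange(w) * pw // w           # nearest-neighbor col indices
    resized = pattern[rows][:, cols]
    perturbed = image + epsilon * np.sign(resized)
    return np.clip(perturbed, 0.0, 1.0)

rng = np.random.default_rng(2)
pattern = rng.standard_normal((16, 16, 3))
for size in (32, 97, 224):                  # arbitrary target scales
    img = rng.random((size, size, 3))
    adv = apply_pattern(img, pattern)
    print(size, adv.shape)
```

The same 16×16 pattern perturbs images of any size, which is the property a fixed-scale generator lacks.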
Learning with sparse rewards is usually inefficient in reinforcement learning (RL). Hindsight Experience Replay (HER) has been shown to be an effective solution for handling the low sample efficiency caused by sparse rewards through goal relabeling. However, HER still has an implicit virtual-positive sparse reward problem caused by the achieved goals, especially for robot manipulation tasks. To solve this problem, we propose a novel model-free continual RL algorithm called Relay-HER (RHER). The proposed method first decomposes and rearranges the original long-horizon task into new sub-tasks of incremental complexity. Subsequently, a multi-task network is designed to learn the sub-tasks in ascending order of complexity. To solve the virtual-positive sparse reward problem, we propose a Random Mixed Exploration strategy (RME), in which the achieved goals of the sub-tasks with higher complexity are quickly changed under the guidance of those with lower complexity. Experimental results show that RHER achieves significantly higher sample efficiency than vanilla HER on five typical robot manipulation tasks, including Push, PickAndPlace, Drawer, Insert, and ObstaclePush. The proposed RHER method has also been applied to a contact-rich push task on a physical robot trained from scratch, achieving a 10/10 success rate using only 250 episodes.
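The goal relabeling at the heart of HER can be sketched in a few lines: replay each transition as if the episode's final achieved goal had been the desired goal all along, recomputing the sparse reward. This is a minimal sketch of vanilla HER's "final" relabeling strategy; RHER's sub-task decomposition and RME exploration are omitted, and the dictionary layout and 0.05 success threshold are illustrative assumptions.

```python
import numpy as np

def her_relabel(episode, reward_fn):
    """Relabel every transition in an episode with the final achieved
    goal as the new desired goal, recomputing the sparse reward."""
    final_goal = episode[-1]["achieved_goal"]
    return [{
        "obs": t["obs"],
        "action": t["action"],
        "goal": final_goal,
        "reward": reward_fn(t["achieved_goal"], final_goal),
    } for t in episode]

# Sparse reward: 0 when the achieved goal is close enough, else -1.
reward_fn = lambda ag, g: 0.0 if np.linalg.norm(ag - g) < 0.05 else -1.0
episode = [
    {"obs": 0, "action": 0, "achieved_goal": np.array([0.0, 0.0])},
    {"obs": 1, "action": 1, "achieved_goal": np.array([0.5, 0.2])},
]
new = her_relabel(episode, reward_fn)
print([t["reward"] for t in new])  # the final step now counts as a success
```

Because the final transition always reaches its own relabeled goal, every relabeled episode contains at least one positive reward, which is also the source of the virtual-positive problem the abstract targets.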
Indoor multi-robot communication faces two key challenges: one is the severe signal-strength degradation caused by blockages (e.g., walls), and the other is the dynamic environment caused by robot mobility. To address these issues, we consider reconfigurable intelligent surfaces (RIS) to overcome signal blockage and to assist the trajectory design among multiple robots. Meanwhile, non-orthogonal multiple access (NOMA) is adopted to cope with the scarcity of spectrum and to enhance the connectivity of the robots. Considering the limited battery capacity of the robots, we aim to maximize energy efficiency by jointly optimizing the transmit power of the access point (AP), the phase shifts of the RIS, and the trajectories of the robots. A novel federated deep reinforcement learning (F-DRL) approach is developed to solve this challenging problem with one dynamic long-term objective. With each robot planning its own path and downlink power, the AP only needs to determine the phase shifts of the RIS, which can significantly reduce the computational overhead due to the lowered training dimensionality. Simulation results reveal the following findings: i) compared with centralized DRL, the proposed F-DRL can reduce the convergence time by at least 86%; ii) the designed algorithm can adapt to an increasing number of robots; iii) compared with traditional OMA-based benchmarks, the NOMA-enhanced scheme can achieve higher energy efficiency.